Abstract

We study some stochastic models of physical mapping of genomic
sequences. Our starting point is a global construction of the process
of the clones and of the process of the anchors which are used to map
the sequence. This yields explicit formulas for the moments of the
proportion occupied by the anchored clones, even in inhomogeneous
models. This also allows us to compare, in this respect, inhomogeneous
models with homogeneous ones. Finally, for homogeneous models, we
provide nonasymptotic bounds on the variance and we prove functional
invariance results.



Date: February 2, 2008. 

Abbreviated title: Invariance of physical mappings 

MSC 2000 subject classifications. Primary 60G55, Secondary 92D20, 60F17. 

Key words and phrases. DNA sequences, physical mapping, anchored islands,
coverage processes, FKG inequality, genomic sequences, inhomogeneous
Poisson processes, invariance principle.



Introduction 

The goal of genomic physical mapping projects is to reconstruct almost
completely the sequence of a genome, starting from a multitude of exactly
sequenced fragments, which are called clones. One approach to the
reconstruction of the overall positions of these clones in the complete
genomic sequence uses so-called anchors. These are short, exactly sequenced
portions of the genome which are assumed to appear only once in the full
genomic sequence. An anchored clone is a clone which contains an anchor.
In this paper, we assume that the positions of the anchors, hence of the
anchored clones, are exactly known. Maximal connected unions of anchored
clones are called islands or, more precisely, anchored islands, also known
as contigs. The complement of the islands is called the ocean. When suitably
rescaled, the full genomic sequence is identified with (a portion of) the
real line, the anchors are identified with points, and the clones and the
islands are identified with intervals.

The overall quality of the reconstruction of a given genomic sequence
obviously depends on the number of islands, on their lengths, and on the
proportion of the sequence which is occupied by the ocean, among other
characteristics of the project. One hopes that the islands are as few and as
long as possible, and that the proportion occupied by the ocean is as low as
possible. Arratia et al. (1991) introduced a stochastic model of physical
mapping, where the positions of the right ends of the clones and the
positions of the anchors are distributed according to independent
homogeneous Poisson processes on the real line, and where the lengths of
the clones are random, i.i.d., and independent of everything else. For this
model, Arratia et al. computed the mean values of the three quantities of
interest mentioned above. For related studies, see Lander and Waterman
(1988), Ewens et al. (1991), and Grigoriev (1993).

Motivated by the fact that actual genomic sequences do not fulfill the
homogeneity hypotheses which underlie the stochastic model introduced by
Arratia et al., Schbath (1997) and Schbath et al. (2000) extend this setting
in two directions. In both papers, the independence properties of the model
remain, but Schbath (1997) studies the case when the intensities of the
Poisson processes which generate the positions of the clones and the
positions of the anchors may depend on the position along the genome, and
Schbath et al. (2000) study the case when the distributions of the lengths
of the clones may depend on their positions along the genome. In these two
wider contexts, these papers provide expressions of the mean number of
islands, of the mean proportion occupied by the ocean, and, under an
additional technical hypothesis, of the mean length of the islands.

In the present paper, we pursue the study of this class of models. As a first
contribution, we consider the class of models where the Poisson process of
the clones, the Poisson process of the anchors, and the distributions of the
lengths of the clones can all be inhomogeneous simultaneously. To give a
flavor of our results in this direction, we state Proposition 1 below, which
extends formulas of the papers mentioned above for the mean number of
clones and for the mean number of anchored clones which cover a given
point.

To state Proposition 1, we introduce the measure $c(dx)$ on the real line as
the intensity measure of the Poisson process of the (right ends of the)
clones, the measure $a(dx)$ on the real line as the intensity measure of the
Poisson process of the anchors, and, for every $x$ on the real line, the
random variable $L_x$ as the length of a clone whose right end is at
position $x$. We refer to section 1 for more precise definitions of these
objects.

For every $x$ on the real line, $n_C(x)$ denotes the number of clones which
contain the point $x$, and $n_A(x)$ denotes the number of anchored clones
which contain the point $x$.

Proposition 1 (General case) (1) The random variable $n_C(x)$ follows
the Poisson distribution whose mean value is given by the expression
$$E(n_C(x)) = \int_{z \geq x} P(L_z \geq z - x)\, c(dz).$$
(2) The mean value of $n_A(x)$ is given by the expression
$$E(n_A(x)) = \int_{z \geq x} \int_{t \geq z - x} \left(1 - e^{-a([z-t,z])}\right) P(L_z \in dt)\, c(dz).$$
Here $a([z-t,z])$ denotes the measure of the interval $[z-t,z]$ with respect
to the measure $a(dx)$.

(3) The distribution of the random variable $n_A(x)$ is not Poisson. More
specifically, either $n_A(x) = 0$ almost surely, or the variance of $n_A(x)$
is strictly greater than its mean value.

In actual physical mapping projects, the condition that $n_A(x) = 0$ almost
surely is never fulfilled. On the mathematical side, this condition would
correspond to degeneracies such as the fact that $L_z < z - x$ almost surely
for every $z \geq x$, and/or the fact that the intensity of the anchors is
zero on a suitable neighborhood of $x$.

The homogeneous case is when $c(dx) = \kappa\, dx$ and $a(dx) = \alpha\, dx$ for
two given positive constants $\kappa$ and $\alpha$, and when every $L_x$ is
distributed like a given random variable $L$. The specialization of
Proposition 1 to the homogeneous case is as follows.

Corollary 2 (Homogeneous case) In the homogeneous case with parameters
$\kappa$, $\alpha$, and $L$, the mean values of $n_C(x)$ and $n_A(x)$ do not
depend on the point $x$ and are given by the expressions
$$E(n_C) = \kappa E(L), \qquad E(n_A) = \kappa E\left(L\,(1 - e^{-\alpha L})\right).$$

More important than the slight generalizations above, our second
contribution is to provide explicit formulas for the higher moments of these
quantities in the general model with variable intensities. In the
homogeneous case, our results imply, for instance, that the proportion of a
large genomic sequence occupied by the ocean is asymptotically Gaussian;
see Theorem A below.
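As a sanity check of Corollary 2, the following Python sketch estimates $E(n_C(0))$ and $E(n_A(0))$ by simulation. For illustration only, it assumes that $L = \ell$ almost surely, so that Corollary 2 predicts the values $\kappa\ell$ and $\kappa\ell(1 - e^{-\alpha\ell})$; the function names and the parameter values are hypothetical choices of this sketch.

```python
import math
import random

def poisson_points(rate, lo, hi, rng):
    """Sorted points of a homogeneous Poisson process of intensity `rate` > 0 on [lo, hi)."""
    points = []
    x = lo + rng.expovariate(rate)
    while x < hi:
        points.append(x)
        x += rng.expovariate(rate)
    return points

def mean_counts(kappa, alpha, ell, trials=20000, seed=0):
    """Monte Carlo estimates of E(n_C(0)) and E(n_A(0)), assuming L = ell almost surely."""
    rng = random.Random(seed)
    total_c = total_a = 0
    for _ in range(trials):
        # The clone [x - ell, x] contains 0 iff its right end x lies in [0, ell].
        right_ends = poisson_points(kappa, 0.0, ell, rng)
        # Such a clone can only contain anchors located in [-ell, ell].
        anchors = poisson_points(alpha, -ell, ell, rng)
        total_c += len(right_ends)
        total_a += sum(1 for x in right_ends if any(x - ell <= a <= x for a in anchors))
    return total_c / trials, total_a / trials

kappa, alpha, ell = 2.0, 1.5, 1.0
est_c, est_a = mean_counts(kappa, alpha, ell)
print(est_c, kappa * ell)                                  # compare with kappa E(L)
print(est_a, kappa * ell * (1 - math.exp(-alpha * ell)))   # compare with kappa E(L (1 - e^{-alpha L}))
```

The simulation only generates the clones and anchors that can influence the point 0, which keeps each trial cheap.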
Theorem A (Homogeneous case) Consider the homogeneous case with
parameters $\kappa$, $\alpha$, and $L$, and assume that $L$ is square
integrable. For any positive $G$, let the random variable $O_G$ denote the
measure of the intersection of the ocean with any interval of length $G$,
for instance the interval $[0, G]$, and let $\sigma^2(O_G)$ denote the
variance of $O_G$.

(1) There exists a positive constant $\rho < 1$ such that $E(O_G) = \rho G$
for every nonnegative $G$.

(2) There exist finite positive constants $\nu$ and $\lambda$ such that, for
every nonnegative $G$,
$$\nu G - \lambda \leq \sigma^2(O_G) \leq \nu G.$$
Hence $\sigma^2(O_G) \sim \nu G$ when $G \to \infty$. Furthermore, the function
$G \mapsto \sigma^2(O_G)$ is convex, and $\sigma^2(O_G) - \nu G + \lambda \to 0$
when $G \to \infty$.

(3) For every positive $G$, let $\Theta_G$ denote the random process, indexed
by the real numbers $0 \leq t \leq 1$, and defined by
$$\Theta_G(t) := (O_{Gt} - \rho G t)/\sqrt{\nu G}.$$
When $G \to \infty$, the process $\Theta_G$ converges in distribution to a
standard Wiener process on the space of continuous functions on $[0, 1]$,
equipped with the metric of uniform convergence.

(4) The constants $\rho$, $\nu$ and $\lambda$ above can be written explicitly
as integrals which involve the parameters $\kappa$, $\alpha$, and the
distribution of $L$.

The starting point of our results is a global construction of the clones, the
anchors, and the islands, using a single Poisson process. We present this
global construction in section 1. We provide alternative descriptions of this
process, locating for instance the clones by their left ends instead of their
right ends. A natural conjecture in this setting is that the homogeneous
model would be the only one invariant under the symmetry of the real line,
but we disprove this. In section 2 we rewrite in our general setting various
formulas due to Arratia et al., to Schbath, or to Schbath et al. Section 3
provides explicit formulas for every moment of the proportion of the real
line which is occupied by the ocean in the general case, and provides rather
sharp bounds of the variance in the homogeneous case. Finally, section 4
proves the invariance result stated in Theorem A above, in the homogeneous
case. Along the way, we provide asymptotics of the variance when the number
of clones is vanishingly small, and we build comparison tools that yield
effective upper and lower bounds in some inhomogeneous cases.

Acknowledgements We wish to thank Sophie Schbath for an introduction to
this subject and for instructive discussions, Julien Michel for the positive
association argument used in section 4.2, and an anonymous referee for a
careful reading of the first version of this paper.
1 Global model



In this section, we build the clones, the anchors and the islands from a
single Poisson process. Sections 1.2 and 1.3 are not used in the rest of the
paper and may be omitted on a first reading.

1.1 Clones 

Let $\mathbb{R}$ denote the real line and $\mathbb{R}_+ := [0, +\infty)$ the
nonnegative half line. Let $c(dx)$ denote the intensity measure of the
Poisson process of the right ends of the clones. Assume that, when the right
end of a clone is located at $x$, its length follows the distribution of a
given random variable $L_x$. We represent the clone which covers exactly
the interval $[x - t, x]$ of length $t \geq 0$ by the point $(x, t)$ in
$\mathbb{R} \times \mathbb{R}_+$. The distribution of the clones is described
by a Poisson process $C$ on $\mathbb{R} \times \mathbb{R}_+$ of intensity
measure $m$, with
$$m(dx\,dt) := c(dx)\, P(L_x \in dt).$$

In other words, $C$ is a random subset of $\mathbb{R} \times \mathbb{R}_+$,
which is almost surely locally finite, and such that the following holds.
For all Borel subsets $D$ and $D'$ of $\mathbb{R} \times \mathbb{R}_+$ such
that $D \cap D'$ is empty, the random number of points of $C$ in $D$ and the
random number of points of $C$ in $D'$ are independent. Furthermore, for
every Borel subset $D$ of $\mathbb{R} \times \mathbb{R}_+$, the number of
points of $C$ in $D$ is a Poisson random variable of mean value $m(D)$.

In fact, the intensity measure $m$ can be any Borel measure on
$\mathbb{R} \times \mathbb{R}_+$ with a locally finite first marginal $c$,
given by
$$c(dx) := m(dx \times \mathbb{R}_+).$$
That is, one assumes that $c([-G,G]) = m([-G,G] \times \mathbb{R}_+)$ is finite
for every finite positive $G$. The assumption that $c$ is locally finite
ensures that the distribution of $L_x$ is well defined, and given by the
Radon-Nikodym derivative
$$P(L_x \in dt) := m(dx\,dt)/c(dx).$$

1.2 Alternative descriptions of the clones

At first sight, it may seem rather arbitrary to locate the position of a
clone by its right endpoint, rather than by its midpoint or by its left
endpoint. In fact, these alternative descriptions are also characterized by
Poisson processes, albeit possibly with different intensities. For instance,
using the couple $(y, t)$ to describe the clone $[y, y + t]$ yields a Poisson
process on $\mathbb{R} \times \mathbb{R}_+$ of intensity measure $m'$, with
$$m'(dy\,dt) := c'(dy)\, P(L'_y \in dt).$$

One obtains $m'$ from $m$, or rather, one obtains $c'(dy)$ and the
distributions of the random variables $L'_y$ from $c(dx)$ and from the
distributions of the random variables $L_x$, as follows. For any nonnegative
test function $\Phi$, the expected value of the sum of $\Phi(y, x)$ over every
clone $[y, x]$ reads
$$E\Big(\sum_{[y,x]\ \text{clone}} \Phi(y, x)\Big) = \iint m(dx\,dt)\, \Phi(x - t, x) = \iint m'(dy\,dt)\, \Phi(y, y + t).$$
In other words, one asks that
$$\int c(dx)\, E(\Phi(x - L_x, x)) = \int c'(dy)\, E(\Phi(y, y + L'_y)).$$
Since this equality holds for every test function $\Phi$, $c'(dy)$ and the
distributions of the random variables $L'_y$ are given by
$$c'(dy) = \int_y^{+\infty} c(dx)\, P(L_x \in x - dy),$$
$$P(L'_y \in dt) = P(L_{y+t} \in dt)\, c(t + dy)/c'(dy).$$
Similar formulas give the intensity measures associated with the description
of a clone by its midpoint and its length, or by its two endpoints.
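The formula for $c'(dy)$ can be checked numerically. The Python sketch below assumes a hypothetical clone intensity $c(dx) = \kappa(1 + \tfrac12 \sin x)\,dx$ and lengths $L_x$ exponentially distributed with mean 1 for every $x$. With $t = x - y$, the formula reads $c'(dy) = \big(\int_0^{+\infty} c(y+t)\, e^{-t}\,dt\big)\,dy$, whose density is $\kappa(1 + \tfrac14(\sin y + \cos y))$ in this example. The sketch also evaluates the density of $L'_y$, which is not exponential even though every $L_x$ is.

```python
import math

KAPPA = 2.0  # hypothetical level of the clone intensity

def c_density(x):
    """Hypothetical density of c(dx): kappa * (1 + sin(x) / 2)."""
    return KAPPA * (1.0 + 0.5 * math.sin(x))

def length_density(x, t):
    """Hypothetical density of L_x at t: exponential with mean 1, whatever x."""
    return math.exp(-t) if t >= 0 else 0.0

def c_prime_density(y, t_max=50.0, n=20000):
    """Density of c'(dy) = int_y^inf c(dx) P(L_x in x - dy), computed as
    int_0^t_max c(y + t) f_{L_{y+t}}(t) dt with the midpoint rule."""
    h = t_max / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += c_density(y + t) * length_density(y + t, t)
    return total * h

def left_length_density(y, t):
    """Density of L'_y at t, from P(L'_y in dt) = P(L_{y+t} in dt) c(t + dy) / c'(dy)."""
    return length_density(y + t, t) * c_density(y + t) / c_prime_density(y)

y = 0.7
print(c_prime_density(y), KAPPA * (1 + 0.25 * (math.sin(y) + math.cos(y))))
print([round(left_length_density(y, t), 4) for t in (0.5, 1.0, 2.0)])
```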



1.3 On the (non) specificity of the homogeneous clones 

Based upon the preceding section, the reader might be led to believe that
the homogeneous model is privileged with respect to the transformations of
the intensity measure $m(dx\,dt)$ into the intensity measure $m'(dy\,dt)$ and
of $m'(dy\,dt)$ into $m(dx\,dt)$. To wit, if the intensity $c(dx)$ and the
distributions of the random variables $L_x$ are invariant under the
translations of the real line, so are the intensity $c'(dy)$ and the
distributions of the random variables $L'_y$. Thus, in the homogeneous case,
$c(dx) = c'(dx) = \kappa\, dx$ and the distributions of every $L_x$ and every
$L'_y$ coincide.

Our goal in this section is to point out that there are other cases where
the two intensity measures $m$ and $m'$ coincide. To build such examples, we
introduce, for every $x$ on the real line, the unit interval $U_x$ centered
at $x$, that is,
$$U_x := [x - 1/2, x + 1/2).$$
Let $B_0$ denote the union of the intervals $U_{2k}$ over every integer $k$,
and let $B_1$ denote its complement. Let $u_0(dx)$ denote a finite measure
on $U_0$, and $u_1(dx)$ a finite measure on $U_1$. Let $c(dx)$ denote the
unique measure on $\mathbb{R}$ which is invariant under the translation
$x \mapsto x + 2$ and whose restrictions to $U_0$ and to $U_1$ are $u_0(dx)$
and $u_1(dx)$, respectively. Thus, $c = c_0 + c_1$ with, for $i = 0$ and for
$i = 1$,
$$c_i(dx) := \sum_k u_i(dx - 2k),$$
where both sums run over every integer $k$. In other words, $c(dx)$ can be
any locally finite measure on $\mathbb{R}$ invariant under the translation
$x \mapsto x + 2$, and the measure $c_0(dx)$, respectively the measure
$c_1(dx)$, denotes the restriction of $c(dx)$ to $B_0$, respectively to $B_1$.

Assume finally that $L_x = 2$ with full probability when $x$ is in $B_0$, and
that $L_x = 4$ with full probability when $x$ is in $B_1$. Since $L_x$ is
always an even integer, the endpoints of a given clone are either both in
$B_0$ or both in $B_1$. Using this remark, one can check that $m = m'$.
Besides, the process which locates the clones by their midpoints is also
explicit: a clone of length 2 whose right end $x$ is in $B_0$ has its
midpoint $x - 1$ in $B_1$, and a clone of length 4 whose right end $x$ is in
$B_1$ has its midpoint $x - 2$ in $B_1$. Hence every midpoint belongs to
$B_1$, the midpoints of the clones of length 2 being distributed according
to the translate of $c_0$ by $-1$, and those of the clones of length 4
according to $c_1$ itself, by periodicity.
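The parity argument above, and the location of the midpoints, can be checked mechanically. The Python sketch below (the helper names are ours) encodes membership in $B_0$ through the index $k$ of the interval $U_k$ which contains $x$, and verifies both claims on a grid of right ends.

```python
import math

def in_B0(x):
    """True iff x lies in B_0, the union of the intervals U_{2k} = [2k - 1/2, 2k + 1/2)."""
    k = math.floor(x + 0.5)   # index of the interval U_k = [k - 1/2, k + 1/2) containing x
    return k % 2 == 0

def clone_of(x):
    """Clone [x - L_x, x] with L_x = 2 if x is in B_0 and L_x = 4 if x is in B_1."""
    return (x - 2.0, x) if in_B0(x) else (x - 4.0, x)

for i in range(1000):
    x = i / 100 - 5.0                     # grid of right ends in [-5, 5)
    left, right = clone_of(x)
    assert in_B0(left) == in_B0(right)    # both endpoints in B_0, or both in B_1
    assert not in_B0((left + right) / 2)  # every midpoint lies in B_1
print("parity checks passed")
```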

In the example above, the distributions of the lengths are discrete, hence
the measure $m(dx\,dt)$ is singular with respect to the Lebesgue measure.
However, the same idea can be adapted to produce examples where
$m(dx\,dt)$ is absolutely continuous. To see this, introduce the Poisson
process which describes a clone $[y, x]$ by its endpoints $(y, x)$, and
assume that the intensity measure $m^*$ of this Poisson process is
$$m^*(dy\,dx) := dy\,dx \sum_k \left( \mathbf{1}\{(y,x) \in U_{2k} \times U_{2k+2}\} + \mathbf{1}\{(y,x) \in U_{2k-1} \times U_{2k+3}\} \right),$$
where the sum runs over every integer $k$. In words, the left endpoints and
the right endpoints of the clones both have homogeneous intensity measures,
and either both endpoints of a clone belong to $B_0$ or both belong to
$B_1$. Furthermore, given that the left endpoint $y$ belongs to $B_0$, the
right endpoint $x$ is uniformly distributed over the next unit interval of
$B_0$ to the right of $y$, that is, over the connected component of $B_0$
which contains $y + 2$. Given that the left endpoint $y$ belongs to $B_1$,
the right endpoint $x$ is uniformly distributed over the second next unit
interval of $B_1$ to the right of $y$, that is, over the connected component
of $B_1$ which contains $y + 4$.

In this new example, the measure $m(dx\,dt)$ is as follows. The intensity
$c(dx)$ is the Lebesgue measure. The length $L_x$ is uniformly distributed
over $U_{x-2k}$ when $x$ is in $U_{2k+2}$, and $L_x$ is uniformly distributed
over $U_{x-2k+1}$ when $x$ is in $U_{2k+3}$. The support of the distribution of
$L_x$ is a unit subinterval of the interval $[1, 3]$ when $x$ is in $B_0$, and
a unit subinterval of the interval $[3, 5]$ when $x$ is in $B_1$, hence the
distribution of $L_x$ cannot be the same for every $x$ on the real line.
Finally, $m = m'$ because $m^*$ is invariant under the reflections of the real
line, since these exchange the left endpoints and the right endpoints of the
clones while leaving their lengths unchanged.

1.4 Anchors 

In this section and in the rest of the paper, we return to the Poisson
process $C$ of intensity $m$, which represents the clones by their right
endpoint $x$ and by their length $t$.

The anchors are described by a Poisson process $A$ on the real line, with
intensity $a(dx)$, independent of the Poisson process $C$ of the clones which
we defined in section 1.1. Thus, for every Borel subset $D$ of the real line,
the number of anchors in $D$ is a random variable whose distribution is
Poisson with mean value $a(D)$, and the numbers of anchors in the Borel sets
$D$ and $D'$ are independent random variables as soon as $D \cap D'$ is empty.

For every subset $D$ of the real line, let $I(D)$ denote the cone of influence
of $D$ in $\mathbb{R} \times \mathbb{R}_+$. This is the set of clones $(x, t)$
which become anchored clones when every point of $D$ becomes an anchor.
Thus,
$$I(D) := \{(x, t) \in \mathbb{R} \times \mathbb{R}_+ \,;\, [x - t, x] \cap D \neq \emptyset\}.$$
For every measurable $D$, the process $C_D := C \cap I(D)$ of the clones that
are anchored by $D$ is deduced from $C$ by erasing some clones, hence each
$C_D$ is a Poisson process whose intensity measure $m_D$ on
$\mathbb{R} \times \mathbb{R}_+$ is the restriction of the original intensity
measure $m$ to the set $I(D)$, that is,
$$m_D(dx\,dt) := \mathbf{1}\{(x, t) \in I(D)\}\, m(dx\,dt).$$
For every locally finite subset $D$ of the real line, let $P_D$ denote the
conditioning of $P$ by the event $\{A = D\}$. Finally, let $C_A$ denote the
process of the anchored clones, that is,
$$C_A := \{(x, t) \in C \,;\, [x - t, x] \cap A \neq \emptyset\} = C \cap I(A).$$

1.5 Clones+anchors 

One can, and we shall, simultaneously generate the processes $C$, $A$ and
$C_A$ from a unique Poisson process, as follows. Let
$\mathbb{M} := \mathbb{R}_+ \cup \{*\}$, where $*$ denotes any point which is
not in $\mathbb{R}_+$. We endow the set $\mathbb{M}$ with the smallest
$\sigma$-algebra which contains the Borel sets of $\mathbb{R}_+$ and the
singleton $\{*\}$. We endow the set $\mathbb{R} \times \mathbb{M}$ with the
product $\sigma$-algebra of the Borel $\sigma$-algebra of $\mathbb{R}$ and of
this $\sigma$-algebra of $\mathbb{M}$. Finally, we introduce a Poisson process
on $\mathbb{R} \times \mathbb{M}$ with intensity
$$g(dx\,dt) := m(dx\,dt) + a(dx)\, \delta_*(dt).$$
We call this Poisson process the global process. The point $(x, t)$ with $t$
in $\mathbb{R}_+$ represents the clone $[x - t, x]$ and the point $(x, *)$
represents the anchor at $x$. The restriction of the global process to the
domain $\mathbb{R} \times \mathbb{R}_+$ yields the process of the clones
described in section 1.1, since its intensity, which is the restriction of
$g$ to $\mathbb{R} \times \mathbb{R}_+$, is $m(dx\,dt)$. Likewise, the
projection $(x, *) \mapsto x$ on the real coordinate of the restriction of
the global process to the domain $\mathbb{R} \times \{*\}$ yields the process
of the anchors described in section 1.4, since its intensity is $a(dx)$.
Finally, the process of the clones and the process of the anchors are indeed
independent since they are realized as the restrictions of the global
Poisson process to the domains $\mathbb{R} \times \mathbb{R}_+$ and
$\mathbb{R} \times \{*\}$, which are disjoint subsets of
$\mathbb{R} \times \mathbb{M}$.

Proposition 3 below, as well as Proposition 1 and Corollary 2 in our
introduction, follow from the construction above. The proofs are simple
adaptations of the proofs given by Arratia et al., Schbath, and Schbath et
al., hence we omit them.

Proposition 3 (General case) With respect to $P$, $A$ and $C$ are
independent Poisson processes. For every locally finite $D$, with respect to
$P_D$, $C_D$ is a Poisson process. With respect to $P$, $C_A$ is not a
Poisson process.

1.6 Ocean 

Recall that the ocean $O$ is the complement of the union of the anchored
islands. For every Borel set $D$ of the real line, let $O(D)$ denote the
Lebesgue measure of $O \cap D$. For every positive real number $G$, let
$O_G := O([0, G])$. For every Borel set $Z$ of the real line, let
$$r(Z) := P(Z \subset O).$$
For every $n \geq 1$ and all real numbers $z_1, \ldots, z_n$, let
$$r(z_1, \ldots, z_n) := r(\{z_1, \ldots, z_n\}) = P(z_1 \in O, \ldots, z_n \in O).$$
For instance, $r(z)$ is the probability that $z$ belongs to no anchored clone.
Hence $r(z)$ may depend on $z$, but $r(z)$ coincides with the value of $r(0)$
obtained when the process of the clones and the process of the anchors are
both shifted by $z$. Lemma 4 below stems from the definitions.

Lemma 4 For every Borel set $D$ of the real line and every integer $n \geq 1$,
$$E(O(D)^n) = \int_{D^n} r(z_1, \ldots, z_n)\, dz_1 \cdots dz_n.$$
For instance,
$$E(O_G) = \int_0^G r(z)\, dz.$$
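To make the definition of $O_G$ concrete, here is a minimal Python sketch that computes the measure of the ocean inside $[0, G]$ for one given configuration of clones and anchors. It keeps the clones which contain at least one anchor, clips them to $[0, G]$, merges the overlapping intervals into islands, and subtracts their total length from $G$. The input format (pairs $(x, t)$ for the clones $[x - t, x]$, and a list of anchor positions) is an assumption of this sketch. Averaging this quantity over simulated configurations gives a Monte Carlo estimate of $E(O_G)$.

```python
import bisect

def ocean_measure(clones, anchors, G):
    """Lebesgue measure of the ocean inside [0, G].

    clones: iterable of pairs (x, t), each standing for the clone [x - t, x].
    anchors: iterable of anchor positions.
    """
    anchors = sorted(anchors)
    pieces = []
    for x, t in clones:
        # Index of the first anchor >= x - t; the clone is anchored iff that anchor is <= x.
        i = bisect.bisect_left(anchors, x - t)
        if i < len(anchors) and anchors[i] <= x:
            lo, hi = max(x - t, 0.0), min(x, G)
            if lo < hi:
                pieces.append((lo, hi))
    # Merge overlapping anchored clones into islands and add up their lengths.
    pieces.sort()
    covered = 0.0
    cur_lo = cur_hi = None
    for lo, hi in pieces:
        if cur_hi is None or lo > cur_hi:
            if cur_hi is not None:
                covered += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        covered += cur_hi - cur_lo
    return G - covered

# Clones [0, 2], [4, 5], [6, 9]; anchors at 1 and 8: the islands are [0, 2] and [6, 9].
print(ocean_measure([(2.0, 2.0), (5.0, 1.0), (9.0, 3.0)], [1.0, 8.0], 10.0))  # 5.0
```

Note that a clone anchored outside $[0, G]$ still contributes its clipped part, since islands may be anchored anywhere.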
2 First moments



This section is mainly a rephrasing of results of Arratia et al. and Schbath
et al. Our only contribution here is to include both inhomogeneities
simultaneously in the results, namely, the inhomogeneities of the lengths of
the clones on the one hand, and the inhomogeneities of the positions of the
right ends of the clones and of the anchors on the other hand. We are
interested in $r(z)$, which describes locally the mean value of the
proportion of the real line which is occupied by the ocean.



Lemma 5 Let $J(x, y)$ denote the probability of the event that two points $x$
and $y$ such that $x \leq y$ belong to no common clone. Then
$$J(x, y) = \exp\left(-\int_y^{+\infty} P(L_t \geq t - x)\, c(dt)\right).$$
Caution: our $J(x, x + t)$ is the expression denoted by $J(x, t)$ in the papers
mentioned above.

Lemma 6 For every $z$, the joint law of the positions $x$ and $y$ of the
anchors which are the closest to $z$ on the left and on the right,
respectively, is $A^-(z, dx)\, A^+(z, dy)$ on $\{x \leq z \leq y\}$, where
$$A^-(z, dx) := A(x, z)\, a(dx) \quad\text{and}\quad A^+(z, dy) := A(z, y)\, a(dy).$$
For all points $x \leq y$, we use the notation
$$A(x, y) := \exp\left(-\int_x^y a(dt)\right).$$
Theorem B (Schbath et al.) For every $z$,
$$r(z) = \int_{x \leq z \leq y} \frac{J(x, z)\, J(z, y)}{J(x, y)}\, A(x, y)\, a(dx)\, a(dy).$$
The contribution of the intensity measure $a$ in $r(z)$ corresponds to the
product $A^-(z, dx)\, A^+(z, dy)$.

A quick look at the ratio of the functions $J$ in the integral above could
lead to the erroneous conclusion that $r(z)$ is not well defined when
$J(x, y)$ is not always positive. (One knows that $J(x, y)$ is positive when,
for instance, the random variables $L_t$ are uniformly integrable, and
$c(dt)$ is uniformly bounded, that is, when there exists a finite $\kappa_+$
such that $c(dx) \leq \kappa_+\, dx$.) In fact, one can show that this ratio is
at most 1 for any intensity $c(dt)$ and any distributions of the random
variables $L_t$, hence the formula for $r(z)$ in Theorem B is always valid.
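A short argument for the bound on the ratio, sketched here under the assumption that the measures below are finite, goes as follows. Write $K(x, y)$ (a notation introduced only for this sketch) for the set of clones $(u, t)$ such that $[u - t, u]$ contains both $x$ and $y$, so that $J(x, y) = \exp(-m(K(x, y)))$ by the Poisson property, in agreement with Lemma 5. For $x \leq z \leq y$, an interval contains $x$ and $y$ if and only if it contains $x$ and $z$ as well as $z$ and $y$, hence $K(x, y) = K(x, z) \cap K(z, y)$ and, by inclusion-exclusion,

```latex
\frac{J(x,z)\,J(z,y)}{J(x,y)}
  = \exp\big(-m(K(x,z)) - m(K(z,y)) + m(K(x,z)\cap K(z,y))\big)
  = \exp\big(-m(K(x,z)\cup K(z,y))\big) \le 1 .
```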

We recall that, in the homogeneous case, the process of the clones has
constant intensity $c(dx) = \kappa\, dx$, the lengths of the clones are i.i.d.
and distributed like a random variable $L$, and the process of the anchors
has constant intensity $a(dx) = \alpha\, dx$.



Corollary 7 (Arratia et al.) In the homogeneous case with parameters
$\kappa$, $\alpha$ and $L$, $r(z) = \rho$ does not depend on $z$ and its value is
$$\rho = \int_0^{+\infty} \int_0^{+\infty} \alpha^2 e^{-\alpha(u+v)}\, \frac{J(u)\, J(v)}{J(u + v)}\, du\, dv.$$
Here, $J(u)$ is the probability that no single clone covers a given interval
of length $u$, hence
$$J(u) := \exp\left(-\kappa \int_u^{+\infty} P(L \geq t)\, dt\right).$$
When, furthermore, $L = \ell$ with full probability for a given positive real
number $\ell$, Arratia et al. deduce from this the value of $\rho$ as a
function of $\ell$, $\kappa$ and $\alpha$.

One gets the expression of $\rho$ in Corollary 7 from $r(z)$ in Theorem B
using the change of variables $u = z - x$, $v = y - z$.
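The double integral for $\rho$ is easy to evaluate numerically. The Python sketch below does so with a midpoint rule, assuming for illustration that $L = \ell$ almost surely, so that $J(u) = \exp(-\kappa(\ell - u)^+)$, and compares the result with a direct Monte Carlo estimate of $r(0) = P(0 \in O)$; the parameter values and function names are hypothetical. In this case, a clone contains $0$ if and only if its right end lies in $[0, \ell]$, and such a clone can only contain anchors located in $[-\ell, \ell]$.

```python
import math
import random

def J(u, kappa, ell):
    """J(u) = exp(-kappa * int_u^inf P(L >= t) dt), assuming L = ell almost surely."""
    return math.exp(-kappa * max(ell - u, 0.0))

def rho_numeric(kappa, alpha, ell, n=300):
    """Midpoint rule for the double integral of Corollary 7, truncated at 30 / alpha."""
    h = (30.0 / alpha) / n
    grid = [(i + 0.5) * h for i in range(n)]
    total = 0.0
    for u in grid:
        for v in grid:
            total += (math.exp(-alpha * (u + v))
                      * J(u, kappa, ell) * J(v, kappa, ell) / J(u + v, kappa, ell))
    return alpha ** 2 * total * h * h

def poisson_points(rate, lo, hi, rng):
    """Points of a homogeneous Poisson process of intensity `rate` > 0 on [lo, hi)."""
    points, x = [], lo + rng.expovariate(rate)
    while x < hi:
        points.append(x)
        x += rng.expovariate(rate)
    return points

def rho_monte_carlo(kappa, alpha, ell, trials=20000, seed=0):
    """Monte Carlo estimate of r(0) = P(0 belongs to the ocean)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        right_ends = poisson_points(kappa, 0.0, ell, rng)   # clones containing 0
        anchors = poisson_points(alpha, -ell, ell, rng)     # anchors they may contain
        if not any(x - ell <= a <= x for x in right_ends for a in anchors):
            hits += 1
    return hits / trials

print(rho_numeric(2.0, 1.0, 1.0), rho_monte_carlo(2.0, 1.0, 1.0))
```

When $\kappa = 0$ there are no clones, $J \equiv 1$, and the integral reduces to 1, as it should.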



3 Higher moments 



Higher moments of the quantities introduced above involve functionals of
the processes that depend on more than one point. We first describe the
computation of the variance of the proportion of the real line which is
occupied by the ocean in the general case, then we consider the higher
moments in the general case, and finally we prove precise asymptotics of the
variance in the homogeneous case.



3.1 Variance of the ocean proportion 

Recall that $r(z, z')$ is the probability that neither $z$ nor $z'$ is covered by
anchored clones. Let $r_0(z, z')$, respectively $r_1(z, z')$, respectively
$r_2(z, z')$, denote the probability of the intersection of this event with the
event that the number of anchors between $z$ and $z'$ is 0, respectively 1,
respectively 2 or more. One can decompose each of these events according to
the position of the first anchor to the left of the interval $(z, z')$, which we
call $x$ in the integrals below, to the position of the first anchor to the
right of $(z, z')$, which we call $y$ in the integrals below, and to the
positions of the leftmost and rightmost anchors, if any, in the interval
$(z, z')$, which we call $s$ and $t$ in the integrals below.

Thus $r(z, z') = r_0(z, z') + r_1(z, z') + r_2(z, z')$ with, for $z \leq z'$,
$$r_0(z, z') := \int_{x \leq z \leq z' \leq y} J(x \,|\, z, z' \,|\, y)\, B(dx, dy),$$
$$r_1(z, z') := \int_{x \leq z \leq s \leq z' \leq y} J(x \,|\, z \,|\, s)\, J(s \,|\, z' \,|\, y)\, a(ds)\, B(dx, dy),$$
$$r_2(z, z') := \int_{x \leq z \leq s \leq t \leq z' \leq y} J(x \,|\, z \,|\, s)\, J(t \,|\, z' \,|\, y)\, B(dx, ds)\, B(dt, dy).$$
We mention that $r_i(z, z')$ is defined as an integral of dimension $i + 2$, for
$i = 0$, 1 or 2. We used the following notations. The two-dimensional measure
$B$ is defined on the subset $\{x \leq y\}$ of $\mathbb{R} \times \mathbb{R}$ by
the formula
$$B(dx, dy) := A(x, y)\, a(dx)\, a(dy).$$
For any $x \leq z \leq z' \leq y$,
$$J(x \,|\, z, z' \,|\, y) := \frac{J(x, z)\, J(z', y)}{J(x, y)}, \qquad J(x \,|\, z \,|\, y) := \frac{J(x, z)\, J(z, y)}{J(x, y)}.$$

The quantities involved in the definitions above have the following
interpretations. First, $\mathbf{1}\{x \leq z \leq y\}\, B(dx, dy)$ is the
distribution of the couple formed by the position of the rightmost anchor to
the left of $z$ and the position of the leftmost anchor to the right of $z$.
Second, $J(x \,|\, z \,|\, y)$ is the probability that $z$ is not covered by an
anchored clone when the closest anchor to the left of $z$ is at $x$ and the
closest anchor to the right of $z$ is at $y$. Finally, $J(x \,|\, z, z' \,|\, y)$ is
the probability that $z$ and $z'$ are not covered by anchored clones when the
closest anchor to the left of $z$ is at $x$, the closest anchor to the right of
$z'$ is at $y$, and there is no anchor between $z$ and $z'$. The formula of
Schbath et al. in our Theorem B reads
$$r(z) = \int_{x \leq z \leq y} J(x \,|\, z \,|\, y)\, B(dx, dy).$$
If one forgets the condition that $s \leq t$ in the definition of $r_2(z, z')$,
one gets the product of the integrals over $(x, s)$ and over $(t, y)$, which are
$r(z)$ and $r(z')$, respectively. This implies Lemma 8 below.

Lemma 8 For any $z \leq z'$, $r_2(z, z') = r(z)\, r(z') - r_3(z, z')$, where the
term $r_3(z, z')$ is nonnegative and is
$$r_3(z, z') := \int_{x \leq z \leq s,\ t \leq z' \leq y,\ s \geq t} J(x \,|\, z \,|\, s)\, J(t \,|\, z' \,|\, y)\, B(dx, ds)\, B(dt, dy).$$
As a consequence, the variance $\sigma^2(O_G)$ of $O_G$ is
$$\sigma^2(O_G) = \int_0^G \int_0^G (r_0 + r_1 - r_3)(z, z')\, dz\, dz',$$
where the functions $r_i$ are extended to $z > z'$ by symmetry.
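The computation behind this formula combines Lemma 4, applied with $n = 2$ and with $n = 1$, and Lemma 8: for $z \leq z'$,

```latex
\sigma^2(O_G) = E(O_G^2) - E(O_G)^2
  = \int_0^G\!\!\int_0^G \big(r(z,z') - r(z)\,r(z')\big)\,dz\,dz',
\qquad
r(z,z') - r(z)\,r(z') = r_0(z,z') + r_1(z,z') + r_2(z,z') - r(z)\,r(z')
  = (r_0 + r_1 - r_3)(z,z').
```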

3.2 Higher moments of the ocean proportion 

As mentioned above, one can adapt the technique used in the last section to
study the mean value of any power of $O_G$. For instance,
$$E(O_G^3) = \int_0^G \int_0^G \int_0^G r(z, z', z'')\, dz\, dz'\, dz''.$$
Thus, assuming for instance that $n = 3$, one has to compute the $n$-point
function $r(z, z', z'')$. First, one can assume by symmetry that
$z \leq z' \leq z''$. Let $x$ denote the position of the rightmost anchor to the
left of $z$, and $y$ the position of the leftmost anchor to the right of $z''$.
Let $s$ and $t$ denote the positions of the leftmost and rightmost anchors in
the interval $(z, z')$, and $s'$ and $t'$ the positions of the leftmost and
rightmost anchors in the interval $(z', z'')$, if these exist.

Then, for $z \leq z' \leq z''$, $r(z, z', z'')$ is the sum of $3^{n-1} = 9$ terms
$r_{i,i'}(z, z', z'')$, and, by symmetry, $E(O_G^3)$ is $n! = 6$ times the integral
of this sum over the ordered triples $0 \leq z \leq z' \leq z'' \leq G$. Each term
$r_{i,i'}(z, z', z'')$ corresponds to the number $i = 0$, 1 or 2 of anchors to be
considered in the interval $(z, z')$ and to the number $i' = 0$, 1 or 2 of
anchors to be considered in the interval $(z', z'')$. Namely, no anchor at all,
or a unique anchor, denoted by $s$ or by $s'$, or two extremal anchors,
denoted by $s$ and $t$, or by $s'$ and $t'$.

To take an example, consider the case $i = 2$ and $i' = 1$. This yields
$r_{2,1}(z, z', z'')$ as the integral
$$\int_{D_{2,1}} J(x \,|\, z \,|\, s)\, J(t \,|\, z' \,|\, s')\, J(s' \,|\, z'' \,|\, y)\, a(ds')\, B(dx, ds)\, B(dt, dy),$$
where the domain of integration $D_{2,1}$ has dimension 5 and is defined by
the inequalities
$$x \leq z \leq s \leq t \leq z' \leq s' \leq z'' \leq y.$$
Likewise, if $i = 0$ and $i' = 2$, $r_{0,2}(z, z', z'')$ is the integral
$$\int_{D_{0,2}} J(x \,|\, z, z' \,|\, s')\, J(t' \,|\, z'' \,|\, y)\, B(dx, ds')\, B(dt', dy),$$
where the domain of integration $D_{0,2}$ has dimension 4 and is defined by
the inequalities
$$x \leq z \leq z' \leq s' \leq t' \leq z'' \leq y.$$

More generally, $E(O_G^n)$ is the integral of the $n$-point function
$r(z_1, \ldots, z_n)$ on the domain $[0, G]^n$ with respect to the Lebesgue
measure. For every $n$-tuple $z_1 \leq \cdots \leq z_n$, $r(z_1, \ldots, z_n)$ can
be decomposed as a sum of $3^{n-1}$ contributions. Each of these
contributions corresponds to the event that each interval $[z_k, z_{k+1}]$
contains no anchor at all, or a unique anchor, or at least two anchors.
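The bookkeeping of this decomposition can be sketched in a few lines of Python (the function name is ours). Each contribution is indexed by a tuple $(i_1, \ldots, i_{n-1})$ with entries in $\{0, 1, 2\}$, the number of anchors kept in each gap. The corresponding integral runs over the two outer anchors $x$ and $y$ plus the kept inner anchors, so its dimension is $2 + i_1 + \cdots + i_{n-1}$; this reproduces the dimensions 5 and 4 of $D_{2,1}$ and $D_{0,2}$.

```python
from itertools import product

def decomposition_terms(n):
    """Map each configuration of anchors kept in the n - 1 gaps to the dimension of its integral.

    Entry i_k of a configuration is 0 (no anchor in (z_k, z_{k+1})), 1 (a unique
    anchor) or 2 (the leftmost and rightmost anchors of the gap).  The two outer
    anchors x and y always contribute 2 dimensions.
    """
    return {config: 2 + sum(config) for config in product((0, 1, 2), repeat=n - 1)}

terms = decomposition_terms(3)
print(len(terms))        # 9
print(terms[(2, 1)])     # 5, the dimension of D_{2,1}
print(terms[(0, 2)])     # 4, the dimension of D_{0,2}
```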



3.3 Variance in the homogeneous case 

In this section, we study the homogeneous case, when the intensity measures
are $a(dt) = \alpha\, dt$ and $c(dx) = \kappa\, dx$, and the distribution of the
length $L_x$ of a clone does not depend on its position $x$ and is the
distribution of a random variable $L$. We recall that the distribution of the
global process is invariant under the translations. This implies that
$r(z) = \rho$ for every $z$, where the value of $\rho$ is given in Corollary 7.
Hence,
$$E(O_G) = G\rho.$$
Since $(z, z') \mapsto r(z, z') - r(z)\, r(z')$ is a symmetric function,
$\sigma^2(O_G)$ is twice an integral over $z' \geq z$. Likewise, the invariance
under the translations implies that $r_i(z, z') = r_i(0, z' - z)$ for every
$z \leq z'$. Introducing $f_i(z) := r_i(0, z)$, one is left with twice some
integrals of the functions $f_i(z)$ over $z$ in $[0, G]$, namely
$$\sigma^2(O_G) = 2 \int_0^G (G - z)\, (f_0(z) + f_1(z) - f_3(z))\, dz.$$


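The values of the quantities $f_i(z)$ for every nonnegative $z$ are
$$f_0(z) = \int_{x, y \geq 0} \alpha^2 e^{-\alpha(x+y+z)}\, \frac{J(x)\, J(y)}{J(x + y + z)}\, dx\, dy,$$
$$f_1(z) = \int_{x, y \geq 0,\ 0 \leq t \leq z} \alpha^3 e^{-\alpha(x+y+z)}\, \frac{J(x)\, J(t)\, J(z - t)\, J(y)}{J(x + t)\, J(z - t + y)}\, dx\, dy\, dt,$$
$$f_3(z) = \int_{x, y, s, t \geq 0,\ s + t \geq z} \alpha^4 e^{-\alpha(x+y+s+t)}\, \frac{J(x)\, J(t)\, J(s)\, J(y)}{J(x + t)\, J(s + y)}\, dx\, dy\, ds\, dt.$$
We mention that $f_0(z)$, respectively $f_1(z)$, respectively $f_3(z)$, is
defined as an integral of dimension 2, respectively 3, respectively 4.

Using the fact that the function $x \mapsto J(x)$ is nondecreasing, one can
bound each $f_i(z)$ as follows:
$$f_0(z) \leq e^{-\alpha z}\, j(\alpha), \qquad f_1(z) \leq \alpha z\, e^{-\alpha z}\, j(\alpha)^2, \qquad f_3(z) \leq (1 + \alpha z)\, e^{-\alpha z}\, j(\alpha)^2,$$
with the notation
$$j(\alpha) := \int_0^{+\infty} \alpha e^{-\alpha x}\, J(x)\, dx.$$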
To prove the upper bound of $f_0(z)$, one uses the fact that
$J(y) \leq J(x + y + z)$, and one integrates the resulting upper bound.
Likewise, to prove the upper bound of $f_1(z)$, one uses the facts that
$J(t) \leq J(x + t)$ and $J(z - t) \leq J(z - t + y)$, and one integrates the
resulting upper bound. Finally, to prove the upper bound of $f_3(z)$, one uses
the facts that $J(t) \leq J(x + t)$ and $J(s) \leq J(s + y)$, and one integrates
the resulting upper bound. In this last case, this yields
$$f_3(z) \leq j(\alpha)^2 \int_{s, t \geq 0,\ s + t \geq z} \alpha^2 e^{-\alpha(s+t)}\, ds\, dt,$$
and the last double integral is indeed $(1 + \alpha z)\, e^{-\alpha z}$.
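The last double integral can be computed with the change of variables $w = s + t$: for a fixed $w \geq 0$, the pairs $(s, t)$ with $s, t \geq 0$ and $s + t = w$ form a segment whose projection on the $s$-axis has length $w$, hence

```latex
\int_{s,t\ge 0,\; s+t\ge z} \alpha^2 e^{-\alpha(s+t)}\,ds\,dt
  = \int_z^{+\infty} \alpha^2 w\, e^{-\alpha w}\,dw
  = \Big[-(1+\alpha w)\,e^{-\alpha w}\Big]_{w=z}^{w=+\infty}
  = (1+\alpha z)\,e^{-\alpha z}.
```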

Since $J(x) \leq 1$, $j(\alpha) \leq 1$. Furthermore, the limit of $J$ at infinity
is 1, hence $f_0(z) \sim e^{-\alpha z}\, j(\alpha)^2$ when $z \to \infty$. Let
$$\sigma_i^2(G) := \int_0^G 2(G - z)\, f_i(z)\, dz.$$
From the bounds on the three functions which are stated above, it is not
difficult to prove that, when $G \to \infty$,
$$\sigma_i^2(G) = \nu_i G - \lambda_i + \tau_i(G),$$
where $\tau_i(G) = o(1)$ for $i = 0$, 1 and 3. More specifically, these bounds
imply that the numbers $\nu_i$ and $\lambda_i$, defined as
$$\nu_i := \int_0^{+\infty} 2 f_i(z)\, dz, \qquad \lambda_i := \int_0^{+\infty} 2 z\, f_i(z)\, dz,$$
are indeed finite and positive, and simple computations show that
$$\tau_i(G) := \int_G^{+\infty} 2(z - G)\, f_i(z)\, dz.$$
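The simple computations in question amount to splitting at $G$ the integral which defines $\sigma_i^2(G)$:

```latex
\sigma_i^2(G) = \int_0^{+\infty} 2(G - z)\, f_i(z)\, dz - \int_G^{+\infty} 2(G - z)\, f_i(z)\, dz
  = \nu_i G - \lambda_i + \int_G^{+\infty} 2(z - G)\, f_i(z)\, dz .
```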



Introduce $\tau(G) := \tau_0(G) + \tau_1(G) - \tau_3(G)$. Since each $\tau_i(G)$ is nonnegative, $|\tau(G)|$ is at most the maximum of $\tau_0(G) + \tau_1(G)$ and $\tau_3(G)$. Since $j(\alpha) \le 1$, our bounds on the three functions $f_i$ imply that

$$|\tau(G)| \le \int_G^{+\infty} 2(z-G)(1+\alpha z)\, e^{-\alpha z}\,dz.$$
Jg 

Performing the integration, one gets

$$|\tau(G)| \le 2\alpha^{-2}\,(3 + \alpha G)\, e^{-\alpha G}.$$

Finally, when $G \to \infty$, $\tau(G) = O(G e^{-\alpha G})$.
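This integration can be double-checked numerically. The sketch below compares a truncated midpoint-rule evaluation of $\int_G^{+\infty} 2(z-G)(1+\alpha z)e^{-\alpha z}\,dz$ with the closed form $2\alpha^{-2}(3+\alpha G)e^{-\alpha G}$. The helper names and the values of $\alpha$ and $G$ are hypothetical choices made for the check.

```python
import math

def tau_bound_integral(alpha, G, n=200_000):
    # Midpoint rule for int_G^inf 2 (z - G) (1 + alpha z) exp(-alpha z) dz,
    # truncated at G + 60 / alpha, beyond which the integrand is negligible.
    upper = G + 60.0 / alpha
    h = (upper - G) / n
    total = 0.0
    for k in range(n):
        z = G + (k + 0.5) * h
        total += 2.0 * (z - G) * (1.0 + alpha * z) * math.exp(-alpha * z)
    return total * h

def tau_bound_closed_form(alpha, G):
    # 2 alpha^{-2} (3 + alpha G) exp(-alpha G), as stated in the text.
    return 2.0 / alpha ** 2 * (3.0 + alpha * G) * math.exp(-alpha * G)

print(tau_bound_integral(1.5, 2.0), tau_bound_closed_form(1.5, 2.0))
```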

Assume now that $L \le \ell$ almost surely, for a finite $\ell$. This means that the intensity measure of the global Poisson process on $\mathbb R \times \mathbb R_+$ puts no mass on the set $\mathbb R \times (\ell, +\infty)$. Assume that $z$ and $z'$ are such that $|z - z'| > \ell$. Then $I(z) \cap I(z')$ contains only clones $(x,t)$ such that both points $z$ and $z'$ belong to $[x - t, x]$, hence in particular such that $t \ge |z - z'| > \ell$. Since $I(z) \cap I(z')$ is a subset of $\mathbb R \times (\ell, +\infty)$, its intensity measure must be zero. Thus, the events $\{z \in O\}$ and $\{z' \in O\}$ are in fact measurable with respect to the truncated cones of influence $I(z) \cap (\mathbb R \times [0,\ell])$ and $I(z') \cap (\mathbb R \times [0,\ell])$, respectively. Since these two subsets of $\mathbb R \times \mathbb R_+$ are disjoint, $\{z \in O\}$ and $\{z' \in O\}$ are independent events.

Finally, if $L \le \ell$ almost surely, then $r(z,z') = r(z)\,r(z')$ as soon as $|z - z'| > \ell$, hence $r_0(z) + r_1(z) - r_3(z) = 0$ for every $z > \ell$, and $\tau(G) = 0$ for every $G \ge \ell$.

Proposition 9 below summarizes the results of this section.

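Proposition 9 (1) Let $\nu := \nu_0 + \nu_1 - \nu_3$ and $\lambda := \lambda_0 + \lambda_1 - \lambda_3$. When $G \to \infty$,

$$\sigma^2(O_G) = \nu G - \lambda + o(1).$$

(2) Assume that $L \le \ell$ almost surely for a finite $\ell$. Then, for every $G \ge \ell$,

$$\sigma^2(O_G) = \nu G - \lambda.$$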

4 Functional invariance in the homogeneous case

Our main task in this section is to prove that $\nu$ is positive, that is, not zero. We do this first in the limit $\kappa \to 0$ of a vanishing number of clones, then in the general case. Our techniques also yield upper and lower bounds on the mean value and on the variance of $O_G$ when the intensities are not constant. Finally, we prove the functional invariance result stated earlier.

4.1 Variance for vanishing clones 

Proposition 10 (Homogeneous case) Fix the distribution of $L$ and the value of $\alpha$. Then, if $\kappa$ is small enough, $\nu$ is positive. More precisely, when $\kappa \to 0$,

$$\nu = \alpha^{-2}\, E(\varphi(\alpha L))\, \kappa + o(\kappa),$$

where the function $x \mapsto \varphi(x)$ is explicit, positive on $x > 0$, and given by the formula

$$\varphi(x) := x - 1 + e^{-x}\,(1 - x^2/2).$$

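Proof If $\kappa = 0$, then $j(\alpha) = 1$ and $f_i(z) = r_i^*(z)$, with

$$r_0^*(z) := e^{-\alpha z},\qquad r_1^*(z) := \alpha z\, e^{-\alpha z},\qquad r_3^*(z) := (1+\alpha z)\, e^{-\alpha z},$$

hence $r_0^* + r_1^* - r_3^*$ is identically zero. (Besides, when $\kappa = 0$ there are no clones, so $O_G = G$ almost surely and $\sigma^2(O_G) = 0$.) We now show that the first derivative of $\nu$ with respect to $\kappa$ at $\kappa = 0^+$ is positive.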

When $\kappa = o(1)$, $J(x) = 1 - \kappa H(x) + o(\kappa)$ with

$$H(x) := \int_x^{+\infty} P(L \ge t)\,dt.$$

This implies that $f_i(z) = r_i^*(z) + \kappa\, s_i(z) + o(\kappa)$, for some explicit functions $s_i(z)$. Introducing $w_i := \int_0^{+\infty} s_i(z)\,dz$ and $w := w_0 + w_1 - w_3$, one gets $\nu = \kappa w + o(\kappa)$. For instance,

$$w_0 = \int_{x,y,z\ge 0} \alpha^2 e^{-\alpha(x+y+z)}\,\{H(x+y+z) - H(x) - H(y)\}\,dx\,dy\,dz,$$

and similar expressions for $w_1$ and $w_3$ are obtained. After some tedious but simple computations, one gets

$$w_0 = h_2 - 2h_0,\qquad w_1 = 2h_1 - 4h_0,\qquad w_3 = 2h_2 - 6h_0,$$

where, for every nonnegative integer $n$, the value of $h_n$ is given by

$$h_n := \int_0^{+\infty} \frac{(\alpha x)^n}{n!}\, e^{-\alpha x} H(x)\,dx.$$
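The first of these identities can be checked as follows. This is our own expansion of the computation, recalling that $h_n = \int_0^{+\infty} \frac{(\alpha x)^n}{n!}\, e^{-\alpha x} H(x)\,dx$. The set of points $(x,y,z)$ in the positive octant with $u \le x+y+z \le u+du$ has volume $(u^2/2)\,du$, hence

$$\int_{x,y,z\ge 0}\alpha^2 e^{-\alpha(x+y+z)}\, H(x+y+z)\,dx\,dy\,dz = \int_0^{+\infty} \frac{(\alpha u)^2}{2}\, e^{-\alpha u}\, H(u)\,du = h_2.$$

In the next term, the $y$ and $z$ integrals each contribute a factor $1/\alpha$:

$$\int_{x,y,z\ge 0}\alpha^2 e^{-\alpha(x+y+z)}\, H(x)\,dx\,dy\,dz = \int_0^{+\infty} e^{-\alpha x}\, H(x)\,dx = h_0.$$

The term in $H(y)$ also gives $h_0$, so $w_0 = h_2 - 2h_0$.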



Summing up these three contributions yields $w = 2h_1 - h_2$. Converting everything back in terms of the distribution of $L$, one finally gets

$$w = \alpha^{-2}\, E(\varphi(\alpha L)),$$

where $\varphi$ is given in the statement of the proposition above. It happens that $\psi(x) := e^x\,\varphi(x)$ defines a function $\psi$ such that $\psi(0) = 0$ and whose derivative $\psi'(x) = x\,(e^x - 1)$ is obviously positive for every positive $x$. Thus $\psi(x)$, hence $\varphi(x) = e^{-x}\psi(x)$, is positive for every positive $x$, and $w$ is positive for every distribution of $L$, except in the degenerate case when $L = 0$ almost surely. This proves that $\nu$ is positive for small values of $\kappa$. □
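The sketch below checks the algebraic facts used in this proof. It verifies that $\psi(0) = 0$, that $\psi'(x) = x(e^x - 1)$ (by a central difference), and that $\varphi > 0$ on a grid. It also checks the conversion $2h_1 - h_2 = \alpha^{-2}\varphi(\alpha\ell)$ for a deterministic clone length $L = \ell$. For that length, $H(x) = (\ell - x)^+$. The choice of a deterministic length and the parameter values are illustrative assumptions, and the sketch does not check the expansion of $\nu$ itself.

```python
import math

def phi(x):
    """phi(x) = x - 1 + exp(-x) (1 - x^2 / 2), the function of Proposition 10."""
    return x - 1.0 + math.exp(-x) * (1.0 - x * x / 2.0)

def psi(x):
    """psi(x) = exp(x) phi(x); the proof uses psi(0) = 0 and psi'(x) = x (e^x - 1)."""
    return math.exp(x) * phi(x)

def h(n, alpha, ell, m=100_000):
    """h_n = int_0^inf (alpha x)^n / n! exp(-alpha x) H(x) dx for the hypothetical
    deterministic length L = ell, where H(x) = (ell - x)^+ vanishes beyond ell.
    Midpoint rule with m cells on [0, ell]."""
    step = ell / m
    total = 0.0
    for k in range(m):
        x = (k + 0.5) * step
        total += (alpha * x) ** n / math.factorial(n) * math.exp(-alpha * x) * (ell - x)
    return total * step

alpha, ell = 1.3, 2.0
print(2 * h(1, alpha, ell) - h(2, alpha, ell), phi(alpha * ell) / alpha ** 2)
```

Near $0$ the expression for $\varphi$ loses precision through cancellation, so the positivity grid starts at $x = 0.01$.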

Remark 11 Other limiting cases are possible. Recall that $E(O_G) = \rho G$ for every nonnegative $G$, and that $\sigma^2(O_G) \sim \nu G$ when $G \to \infty$.

1. If $E(L^3) = o(1)$, then $\nu \sim \frac13\,\alpha\kappa\,E(L^3)$.

2. If $\kappa = o(1)$, then $(1 - \rho) \sim \kappa\,E(L\,e^{-\alpha L})$.

3. If $E(L) = o(1)$, then $(1 - \rho) \sim \kappa\,E(L)$.

Note that this last result does not depend on the value of $\alpha$. □
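For small $\kappa$, item 1 is consistent with Proposition 10, as a heuristic rather than a proof. A Taylor expansion of $\varphi$ at $0$ gives

$$\varphi(x) = x - 1 + \Big(1 - x + \frac{x^2}{2} - \frac{x^3}{6} + \frac{x^4}{24} - \cdots\Big)\Big(1 - \frac{x^2}{2}\Big) = \frac{x^3}{3} - \frac{5x^4}{24} + O(x^5).$$

Hence, when $L$ is small, $\alpha^{-2}\kappa\,E(\varphi(\alpha L)) \approx \alpha^{-2}\kappa\,\alpha^3 E(L^3)/3 = \frac13\,\alpha\kappa\,E(L^3)$.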
4.2 Positive dependence

Proposition 12 below deals with possibly inhomogeneous processes.

Proposition 12 (General case) For all Borel sets $Z$ and $Z'$,

$$P(Z \cup Z' \subset O) \ge P(Z \subset O)\, P(Z' \subset O).$$

In particular, $r(z,z') \ge r(z)\, r(z')$ for every $z$ and $z'$.

Corollary 13 is a direct consequence of this proposition and of the expression of $\sigma^2(O_G)$ in section 3.3.

Corollary 13 (Homogeneous case) For all nonzero intensities $\kappa$ and $\alpha$ and every nonzero $L$, the constants $\nu$ and $\lambda$ are positive and the function $G \mapsto \tau(G)$ is nonnegative. In particular, for every $G$,

$$\nu G - \lambda \le \sigma^2(O_G) \le \nu G.$$

Hence $\sigma^2(O_G) \sim \nu G$ when $G \to \infty$. Furthermore, the following properties hold. The function $G \mapsto \sigma^2(O_G)$ is increasing and convex. When $G \to 0$, $\sigma^2(O_G) \sim \rho(1-\rho)\,G^2$. When $G \to \infty$, $\sigma^2(O_G) = \nu G - \lambda + o(1)$.

Proof of corollary 13 As regards $\nu$, recall from section 3.3 that, in the homogeneous case,

$$\nu = \int_0^{+\infty} 2\,(r(0,z) - \rho^2)\,dz.$$

Since $0 < \rho < 1$, $r(0,0) = r(0) = \rho > \rho^2$. Furthermore, one can deduce from section 2 an expression of $r(0,z)$ from the formulas which give $r_i(z,z')$ for $i = 0$, $1$ and $2$. The integrals involved are continuous with respect to $z$ and $z'$ because the functions $J$ involved in these integrals are, and because obvious domination properties hold. Hence, by continuity, $r(0,z) > \rho^2$ for every nonnegative $z$ in a neighborhood of $0$. By Proposition 12, $r(0,z) \ge \rho^2$ for every nonnegative $z$. This implies that $\nu > 0$.

The proofs that $\lambda$ is positive and that $\tau(G)$ is nonnegative are similar.

The equivalent of $\sigma^2(O_G)$ when $G \to 0$ stems from the fact that $r(0,z) \to \rho$ when $z \to 0$ and from the exact formula

$$\sigma^2(O_G) = \int_0^G 2(G-z)\,(r(0,z) - \rho^2)\,dz.$$

Finally, this formula and the fact that $r(0,z) \ge \rho^2$ also show that the function $G \mapsto \sigma^2(O_G)$ is increasing and convex, since the derivative of this function is

$$\int_0^G 2\,(r(0,z) - \rho^2)\,dz,$$

which is nonnegative and nondecreasing in $G$.

□
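The exact formula above turns any covariance function $c(z) = r(0,z) - \rho^2$ into the variance curve $G \mapsto \sigma^2(O_G)$. The sketch below uses a hypothetical covariance $c(z) = \rho(1-\rho)e^{-\beta z}$, chosen only because it is nonnegative with the correct value at $0$. It is not the covariance of the model. The sketch checks, for this example, the properties listed in Corollary 13: the bounds $\nu G - \lambda \le \sigma^2(O_G) \le \nu G$, monotonicity, convexity, and $\sigma^2(O_G) \sim \rho(1-\rho)G^2$ as $G \to 0$.

```python
import math

def variance_from_covariance(cov, G, n=20_000):
    """sigma^2(O_G) = int_0^G 2 (G - z) cov(z) dz by the midpoint rule,
    where cov(z) stands for r(0, z) - rho^2."""
    h = G / n
    total = 0.0
    for k in range(n):
        z = (k + 0.5) * h
        total += 2.0 * (G - z) * cov(z)
    return total * h

# Hypothetical parameters, for illustration only.
RHO, BETA = 0.6, 1.5
C0 = RHO * (1.0 - RHO)  # r(0, 0) - rho^2 = rho - rho^2

def cov(z):
    # Hypothetical nonnegative covariance with cov(0) = rho (1 - rho).
    return C0 * math.exp(-BETA * z)

NU = 2.0 * C0 / BETA        # int_0^inf 2 cov(z) dz
LAM = 2.0 * C0 / BETA ** 2  # int_0^inf 2 z cov(z) dz

print([round(variance_from_covariance(cov, G), 6) for G in (0.5, 1.0, 2.0, 4.0)])
```

For this exponential example, $\sigma^2(O_G) = \nu G - \lambda + \lambda e^{-\beta G}$, so $\tau(G) = \lambda e^{-\beta G} \ge 0$.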
Proof of proposition 12 For any Borel set $Z$, $\{Z \subset O\}$ is a nonincreasing event with respect to the global Poisson process of clones and anchors. To see this, note that if one adds some anchors and/or some clones to a given configuration, the union $\mathbb R \setminus O$ of the anchored islands does not decrease. Hence the indicator function of the event $\{Z \subset O\}$ does not increase. Thus, our proposition is a direct consequence of the Fortuin-Kasteleyn-Ginibre (FKG) inequality

$$P(A \cap A') \ge P(A)\, P(A'),$$

applied to the nonincreasing events $A := \{Z \subset O\}$ and $A' := \{Z' \subset O\}$, see Roy (1991) for instance. □

4.3 Bounds in the general case



In the inhomogeneous case, minimal assumptions on $c(dx)$ and $a(dx)$ yield upper and lower bounds on $E(O_G)$ and $\sigma^2(O_G)$, as we now show. In this section, we assume that the intensities of the processes of the clones and of the anchors are bounded from above and from below by positive constants, uniformly in space. Hence, $a(dx)$ and $c(dx)$ are absolutely continuous with respect to the Lebesgue measure and there exist finite positive constants $\alpha_\pm$ and $\kappa_\pm$ such that

$$\alpha_-\,dx \le a(dx) \le \alpha_+\,dx,\qquad \kappa_-\,dx \le c(dx) \le \kappa_+\,dx.$$

We assume furthermore that the lengths $L_x$ of the clones are uniformly stochastically bounded from above and from below. This means that there exist nonnegative random variables $L_\pm$ such that $L_+$ is integrable, $L_-$ is not almost surely zero, and, for every $x$ and $t$,

$$P(L_- > t) \le P(L_x > t) \le P(L_+ > t).$$

In particular, the family $(L_x)_x$ must be uniformly integrable.

Proposition 14 The assumptions above imply that there exist positive constants $\rho_\pm < 1$ and finite positive constants $\nu_\pm$ such that, for every $G$,

$$\rho_- G \le E(O_G) \le \rho_+ G,\qquad \nu_- G \le \sigma^2(O_G) \le \nu_+ G.$$

In these inequalities, $\rho_-$ corresponds to the homogeneous case of parameters $\kappa_+$, $\alpha_+$ and $L_+$, and $\rho_+$ to the homogeneous case of parameters $\kappa_-$, $\alpha_-$ and $L_-$. For the variance, the dependence is less straightforward, at least the dependence that our techniques yield. The parameter $\nu_+$ that we exhibit depends on $\alpha_-$ alone, a result which may seem surprising, and the parameter $\nu_-$ depends on $\kappa_+$, $\rho_+$ and $\rho_-$.
Proof Since $E(O_G) = \int_0^G r(z)\,dz$, the bounds on $E(O_G)$ follow from the fact that

$$\rho_- \le r(z) \le \rho_+,$$

for every $z$ and for positive $\rho_\pm < 1$. Such bounds on $r(z)$ themselves stem from the fact that the distribution of the ocean, as a random subset of the real line, is nonincreasing with respect to the intensities of the processes of the clones and of the anchors. Hence, by a coupling argument, the value of $E(O_G)$ lies between its values for the homogeneous processes of densities $\alpha_+$ and $\kappa_+$ on the one hand, and $\alpha_-$ and $\kappa_-$ on the other hand, the distributions of the lengths $L_x$ being fixed.

We now examine the influence of the distributions of the lengths. Once again by a coupling argument, uniformly replacing the distributions of the lengths $L_x$ by the distribution of $L_+$ yields longer clones, hence longer islands, hence a stochastically smaller ocean. This proves the lower bound of $E(O_G)$. Comparison with $L_-$ yields the upper bound.

Our proof of the lower bound of $\sigma^2(O_G)$ goes as follows. One knows that

$$\sigma^2(O_G) = \int_0^G\!\!\int_0^G \big(r(z,z') - r(z)\,r(z')\big)\,dz\,dz',$$

and that the expression $r(z,z') - r(z)\,r(z')$ is nonnegative for every $z$ and $z'$. Assume that there exist positive $\delta$ and $\varepsilon$ such that, for every $z$ and $z'$ such that $|z - z'| \le \varepsilon$,

$$r(z,z') - r(z)\,r(z') \ge \delta.$$

The lower bound of $\sigma^2(O_G)$ would then follow, by integrating this inequality over the part of $[0,G]^2$ where $|z - z'| \le \varepsilon$. Now, for every $z \le z'$, if $z'$ is in $O$ and if there is no right end of a clone in $[z,z']$, then $z$ is in $O$. Hence,

$$r(z') = r(z,z') + P(z \notin O,\ z' \in O) \le r(z,z') + P(B),$$

with $B := \{C \cap ([z,z'] \times \mathbb R_+) \ne \emptyset\}$. By definition of the intensity of the Poisson process $C$,

$$P(B) = 1 - e^{-c([z,z'])} \le c([z,z']) \le \kappa_+\,(z' - z).$$

Thus $r(z,z') - r(z)\,r(z') \ge r(z')\,(1 - r(z)) - P(B)$. Since $r(z') \ge \rho_-$ and $r(z) \le \rho_+$, this proves the lower bound

$$r(z,z') - r(z)\,r(z') \ge (1 - \rho_+)\,\rho_- - \kappa_+\,(z' - z).$$

This in turn shows the desired inequality for $z \le z'$ and $z' - z$ small enough.

For the upper bound, it is enough to bound from above the integrals of $r_0(z,z')$ and $r_1(z,z')$, since $r_3(z,z')$ is nonnegative. In the expression of $r_0(z,z')$, for all fixed values of $x$ and $y$, $J(x \mid z,z' \mid y)$ is a nonincreasing function of the distributions of the lengths $L_x$ and of the intensity of the clones, since having more clones and longer clones only makes the ocean
smaller. Thus $r_0(z,z')$ is bounded from above by its value when one replaces $c(dt)$ by $\kappa_-\,dt$ and the distribution of every $L_x$ by the distribution of $L_-$. Likewise, the interpretation of $B(dx,dy)$ as the joint distribution of the positions of the rightmost anchor to the left of $z$ and of the leftmost anchor to the right of $z$, together with a coupling between two processes of anchors with comparable intensities, shows that the anchors become stochastically more distant from $z$ when one replaces $a(dt)$ by the smaller intensity $\alpha_-\,dt$. Hence the probability that $z$ is not covered by an anchored clone cannot decrease. Thus, replacing $a(dt)$ by $\alpha_-\,dt$ cannot make $r_0(z,z')$ decrease.

Finally, the contribution of $r_0$ to the value of $\sigma^2(O_G)$ is bounded from above by its value in the homogeneous case which uses the values $\alpha_-$, $\kappa_-$ and $L_-$, that is, for instance, by $2G/\alpha_-$. Likewise, the contribution of $r_1$ to the value of $\sigma^2(O_G)$ is at most $2G/\alpha_-$. This yields the desired upper bound with $\nu_+ := 4/\alpha_-$. □
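Here is one way to see where the two constants $2/\alpha_-$ come from, as a sketch based on the homogeneous bounds stated earlier. Since $j(\alpha) \le 1$, those bounds give $f_0(z) \le e^{-\alpha z}$ and $f_1(z) \le \alpha z\, e^{-\alpha z}$. Apply them with $\alpha = \alpha_-$ and use $0 \le G - z \le G$:

$$\int_0^G 2(G-z)\, f_0(z)\,dz \le G\int_0^{+\infty} 2e^{-\alpha_- z}\,dz = \frac{2G}{\alpha_-},\qquad \int_0^G 2(G-z)\, f_1(z)\,dz \le G\int_0^{+\infty} 2\alpha_- z\, e^{-\alpha_- z}\,dz = \frac{2G}{\alpha_-}.$$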

Remark 15 Alternatively, when $L_z \le \ell$ almost surely for every $z$, recall from the end of section 3 that $r(z,z') = r(z)\,r(z')$ as soon as $|z - z'| > \ell$. Since moreover $|r(z,z') - r(z)\,r(z')| \le 1$, $\sigma^2(O_G)$ is at most the area of the part of the square $[0,G]^2$ inside the diagonal strip $|z - z'| \le \ell$. That area is at most $2\ell G - \ell^2$ when $G \ge \ell$, and $\sigma^2(O_G)$ is at most $G^2$ for every $G$. Hence $\sigma^2(O_G) \le 2\ell G$ for every $G$. □
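The area computation in Remark 15 can be cross-checked by brute force. The sketch below compares the formula $G^2 - (G - \ell)^2 = 2\ell G - \ell^2$ (for $G \ge \ell$) with a count of grid cells whose centers lie in the strip $|z - z'| \le \ell$. The grid size is chosen so that no cell center falls exactly on the strip boundary. The helper names are hypothetical.

```python
def strip_area(G, ell):
    """Area of {(z, z') in [0, G]^2 : |z - z'| <= ell}: the whole square when
    G <= ell, otherwise the square minus two right triangles with legs G - ell."""
    if G <= ell:
        return G * G
    return G * G - (G - ell) ** 2  # = 2 ell G - ell^2

def strip_area_grid(G, ell, n=503):
    """Brute-force estimate: count the cells of an n x n grid whose centers
    lie in the strip, times the area of one cell."""
    h = G / n
    count = 0
    for i in range(n):
        zi = (i + 0.5) * h
        for k in range(n):
            if abs(zi - (k + 0.5) * h) <= ell:
                count += 1
    return count * h * h

print(strip_area(5.0, 1.0), strip_area_grid(5.0, 1.0))
```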

Remark 16 One can adapt the proofs in this section to some cases when 
the intensities of the clones and of the anchors are zero in some places, as 
long as the intensities stay bounded from below on regions which are spread 
out enough. □ 

4.4 Convergence in distribution 

We first explain how one could prove the convergence of the moments by elementary techniques. We then show that general invariance results apply and directly yield the desired convergence.

4.4.1 Method of moments 

Assume first that $L \le \ell$ almost surely. Then, a crucial remark from the end of section 3 is that the events $\{Z \subset O\}$ and $\{Z' \subset O\}$ are independent as soon as the distance between every point in $Z$ and every point in $Z'$ is at least $\ell$. Furthermore,
If $Z = Z' \cup Z''$ with $|z' - z''| \ge \ell$ for every $z' \in Z'$ and every $z'' \in Z''$, one gets $E(\pi(Z)) = E(\pi(Z'))\,E(\pi(Z''))$.

For instance, if $n = 3$, every nontrivial partition of $Z$ includes at least one singleton $\{z\}$, and $E(\pi(\{z\})) = 0$. Hence $E(\pi(Z))$ is zero except when all the distances between the nonempty subsets of $Z$ are at most $\ell$. Ordering the points $z$, $z'$ and $z''$, we are left with the domain

$$z \le z' \le z + \ell,\qquad z' \le z'' \le z' + \ell,$$

whose volume is at most $\ell^2 G$. Hence $E((O_G - \rho G)^3) = O(G)$.
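The volume bound is the iterated integral

$$\int_0^G\!\int_z^{z+\ell}\!\int_{z'}^{z'+\ell} dz''\,dz'\,dz = \ell^2 G.$$

Assume, as the notation suggests, that $\pi(Z)$ is the product of the centered indicators $\mathbf 1\{z \in O\} - \rho$ over $z \in Z$. Then $|\pi(Z)| \le 1$. Summing over the $3! = 6$ orderings of the three points gives the explicit bound $|E((O_G - \rho G)^3)| \le 6\,\ell^2 G$.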

If $n = 4$, the only difference with the case $n = 3$ comes from the partitions of $Z$ into two pairs $Z'$ and $Z''$. These contribute to the result even when the distance from $Z'$ to $Z''$ is large. The integrals of $E(\pi(Z'))$ and of $E(\pi(Z''))$ over pairs of points of $[0,G]$ are each $O(G)$, hence $E((O_G - \rho G)^4)$ is $O(G^2)$.

Likewise, for every positive integer $k$, the moments $E((O_G - \rho G)^{2k})$ and $E((O_G - \rho G)^{2k+1})$ are both $O(G^k)$.

One can also compute the asymptotics of the moments of $O_G$ as $G \to \infty$. To do this, one starts from the expression of $E((O_G - \rho G)^{2n})$ as the integral of $E(\pi(Z))$ over the points $Z$ in $[0,G]^{2n}$. When there exists a partition of $Z$ into two parts $Z'$ and $Z''$ at a distance at least $\ell$, $E(\pi(Z))$ is the product $E(\pi(Z'))\,E(\pi(Z''))$. The remaining points $Z$ span a volume in $[0,G]^{2n}$ which is $o(G^n)$, hence they contribute only a vanishing part of the asymptotics.

This yields recursions between the asymptotic moment of degree $2n$ and the asymptotic moments of even degrees at most $2n - 2$. One can deduce from these recursions the convergence of the moments of $(O_G - \rho G)/\sqrt G$ to the moments of a Gaussian random variable.
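The recursions themselves are not written out above. As a hedged illustration of what they must reproduce, the limit is a centered Gaussian with variance $\nu$. Its even moments satisfy $m_{2n} = (2n-1)\,\nu\, m_{2n-2}$, that is, $m_{2n} = (2n-1)!!\,\nu^n$. Here $(2n-1)!!$ is the number of ways to split $2n$ points into pairs. Heuristically, this matches the role of the partitions into pairs in the case $n = 4$: each pair contributes about $\nu G$. The sketch checks both facts with the arbitrary value $\nu = 2$. The function names are ours.

```python
def gaussian_even_moments(nu, n_max):
    """[E(N^0), E(N^2), ..., E(N^{2 n_max})] for a centered Gaussian N of
    variance nu, via m_{2n} = (2n - 1) * nu * m_{2n-2} and m_0 = 1."""
    moments = [1.0]
    for n in range(1, n_max + 1):
        moments.append((2 * n - 1) * nu * moments[-1])
    return moments

def count_pairings(points):
    """Number of ways to split the list `points` into unordered pairs
    (zero when the list has odd length)."""
    if not points:
        return 1
    rest = points[1:]
    # Pair points[0] with each remaining point in turn, then pair up the others.
    return sum(count_pairings(rest[:i] + rest[i + 1:]) for i in range(len(rest)))

print(gaussian_even_moments(2.0, 3), [count_pairings(list(range(2 * n))) for n in range(1, 5)])
```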

Finally, one could adapt this strategy to the case where L is unbounded, 
thus reaching the same conclusion. 

4.5 Direct method 

A stronger conclusion follows directly from classical results by Doukhan et al. (1994), for every square integrable $L$. To see this, introduce, for every integer $n$, the random variable

$$X_n := O([n, n+1]) - \rho.$$

Let $\mathcal F_n$ denote the $\sigma$-algebra generated by the collection $(X_i)_{i \le n}$, and let $\mathcal G_n$ denote the $\sigma$-algebra generated by the collection $(X_i)_{i \ge n}$. The sequence $(X_n)_n$ is generated by the action of the shift

$$\vartheta : (x,t) \mapsto (x+1, t)$$

on $\mathbb R \times \mathbb R_+$, since $X_n = X_0 \circ \vartheta^n$ for every integer $n$. The strong mixing coefficients $\alpha_n$ associated to the stationary sequence $(X_n)_n$ are defined, for every integer $n \ge 0$, by

$$\alpha_n := \sup\{P(B \cap B') - P(B)\,P(B')\ ;\ B \in \mathcal F_0,\ B' \in \mathcal G_n\}.$$

Since $|X_0| \le 1$ almost surely, the condition in Doukhan et al. (1994) reduces to the summability of the series of general term $\alpha_n$. Neglecting the influence of the anchors does not decrease the value of $\alpha_n$. Thus $\alpha_n \le P(B_n)$, where $B_n$ is the event that at least one clone covers both points $0$ and $n$. One can bound each $P(B_n)$ as follows:

$$P(B_n) = 1 - J(n) \le \int_n^{+\infty} \kappa\, P(L \ge t)\,dt.$$

This shows that the sequence of general term $P(B_n)$ is summable as soon as $L$ is square integrable. (In fact, this sequence is summable if and only if $L$ is square integrable, proof omitted.) This shows that the functional invariance result stated earlier holds, at least for the processes $O_G$ such that $G$ is an integer. The general case is an easy consequence, since $O_G$ depends on $G$ in a monotone way.
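Here is a sketch of one standard way to see the equivalence. It is our reconstruction under the homogeneous assumptions, not the authors' omitted proof. A clone $[X - L, X]$ covers both $0$ and $n$ if and only if $n \le X \le L$. The number of such clones is therefore Poisson with mean $\kappa H(n)$, where $H(x) = \int_x^{+\infty} P(L \ge t)\,dt$, and $P(B_n) = 1 - e^{-\kappa H(n)}$. Since $u e^{-u} \le 1 - e^{-u} \le u$ and $H(n) \le H(0) = E(L)$, we get $\kappa e^{-\kappa E(L)} H(n) \le P(B_n) \le \kappa H(n)$. Moreover $H$ is nonincreasing, so

$$\sum_{n \ge 1} H(n) \le \int_0^{+\infty} H(x)\,dx \le \sum_{n \ge 0} H(n),\qquad \int_0^{+\infty} H(x)\,dx = \int_0^{+\infty} t\,P(L \ge t)\,dt = \tfrac12\,E(L^2).$$

Hence the series of general term $P(B_n)$ converges if and only if $E(L^2) < \infty$.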

Equivalently, one can directly write $O_G$ as

$$O_G = \rho G + \int_0^G Y_x\,dx,$$

where the stationary centered family $Y_x := \mathbf 1\{x \in O\} - \rho$ is indexed by the real numbers $x$. The same conclusion follows.
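The following Monte Carlo sketch makes the objects $O_G$, $\rho$ and $\nu$ concrete in the homogeneous case. It samples $O_G$ directly. It assumes clones of deterministic length $\ell$ (the text allows a general distribution of $L$), right ends forming a Poisson process of intensity $\kappa$, and anchors forming an independent Poisson process of intensity $\alpha$. The function names are hypothetical. The sample mean of $O_G/G$ is an unbiased estimate of $\rho$, since $E(O_G) = \rho G$. The sample variance divided by $G$ is only a rough estimate of $\nu$, reasonable when $G$ is large. For large $G$, the standardized values $(O_G - \rho G)/\sqrt G$ should look approximately Gaussian, in line with the invariance result.

```python
import bisect
import random

def poisson_points(rate, lo, hi, rng):
    """Sorted points of a homogeneous Poisson process of intensity `rate` on [lo, hi)."""
    points = []
    if rate <= 0.0:
        return points
    x = lo + rng.expovariate(rate)
    while x < hi:
        points.append(x)
        x += rng.expovariate(rate)
    return points

def ocean_length(G, kappa, alpha, ell, rng):
    """One sample of O_G, the Lebesgue measure of the ocean inside [0, G].

    Illustrative assumptions: clones are intervals [x - ell, x] of fixed length
    ell whose right ends x form a Poisson process of intensity kappa; anchors
    form an independent Poisson process of intensity alpha.
    """
    # Only clones with right end in [0, G + ell) can meet [0, G].
    ends = poisson_points(kappa, 0.0, G + ell, rng)
    # Such clones lie inside [-ell, G + ell), so no other anchor matters.
    anchors = poisson_points(alpha, -ell, G + ell, rng)
    pieces = []
    for x in ends:
        left = x - ell
        i = bisect.bisect_left(anchors, left)  # first anchor >= left
        if i < len(anchors) and anchors[i] <= x:  # the clone contains an anchor
            lo, hi = max(left, 0.0), min(x, G)
            if hi > lo:
                pieces.append((lo, hi))
    # Measure of the union of the anchored clones, clipped to [0, G].
    pieces.sort()
    covered, cur_lo, cur_hi = 0.0, None, None
    for lo, hi in pieces:
        if cur_hi is None or lo > cur_hi:
            if cur_hi is not None:
                covered += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        covered += cur_hi - cur_lo
    return max(0.0, G - covered)  # guard against rounding below zero

def estimate_rho_nu(G, kappa, alpha, ell, n_samples, seed=0):
    """Sample mean of O_G / G (estimates rho) and sample variance / G (rough nu)."""
    rng = random.Random(seed)
    samples = [ocean_length(G, kappa, alpha, ell, rng) for _ in range(n_samples)]
    mean = sum(samples) / n_samples
    var = sum((s - mean) ** 2 for s in samples) / (n_samples - 1)
    return mean / G, var / G

print(estimate_rho_nu(20.0, 0.5, 1.0, 1.0, 200, seed=3))
```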